Improving Image Spam Filtering Using Image Text Features

Authors

  • Giorgio Fumera
  • Fabio Roli
  • Battista Biggio
  • Ignazio Pillai

Abstract

In this paper we consider an approach to image spam filtering, previously proposed by other authors, that uses image classifiers to discriminate between ham and spam images. In previous work this approach was implemented using “generic” image features. In this paper we show that its effectiveness can be improved by using specific features related to the graphical characteristics of embedded text. The features we consider are derived from measures that we proposed in our previous work to detect the image obfuscation techniques spammers often use to make OCR tools ineffective. An experimental investigation is carried out on a set of images taken from two corpora of real ham and spam emails.
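
The abstract does not name the text-graphics features. As an illustration only, the sketch below assumes one plausible measure of this kind, the perimetric complexity P²/(4πA) of a binarized text image, which grows when characters are broken up or cluttered with noise, and passes such feature values to a simple nearest-centroid classifier. The feature choice, the classifier, and the function names perimetric_complexity and nearest_centroid are hypothetical stand-ins, not the method evaluated in the paper; binarization of the input image is also assumed to have been done already.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical representation: a binarized image as rows of 0/1 values,
# where 1 marks a text (foreground) pixel. Binarization itself is not shown.
BinaryImage = Sequence[Sequence[int]]


def perimetric_complexity(img: BinaryImage) -> float:
    """Return P**2 / (4 * pi * A) for the foreground of a binary image.

    P counts pixel edges that separate a foreground pixel from background
    (4-neighbourhood; positions outside the image count as background),
    and A is the number of foreground pixels. Broken or noisy strokes have
    a long boundary relative to their area, so the value grows.
    """
    rows = len(img)
    cols = len(img[0]) if rows else 0
    area = 0
    perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if not img[r][c]:
                continue
            area += 1
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < rows and 0 <= nc < cols
                # Short-circuit avoids indexing outside the image.
                if not inside or not img[nr][nc]:
                    perimeter += 1
    if area == 0:
        return 0.0  # no text pixels: define the feature as 0
    return perimeter ** 2 / (4 * math.pi * area)


def nearest_centroid(
    train: List[List[float]], labels: List[str], x: List[float]
) -> Optional[str]:
    """Label x with the class whose mean training feature vector is closest.

    A deliberately simple stand-in for the image classifier; any trained
    classifier could take its place. Returns None if there is no training data.
    """
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, lab in zip(train, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    best_label: Optional[str] = None
    best_dist = math.inf
    for lab, acc in sums.items():
        centroid = [v / counts[lab] for v in acc]
        dist = math.dist(centroid, x)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = lab, dist
    return best_label
```

On a 4×4 solid block of text pixels the feature equals 4/π (about 1.27), while on a 4×4 checkerboard of isolated pixels it equals 32/π (about 10.19); a feature of this kind relies on that gap between clean and fragmented strokes.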

Similar sources

Detecting Image Spam Using Image Texture Features

Filtering image email spam is considered a challenging problem because spammers keep modifying the images used in their campaigns by employing different obfuscation techniques. These techniques prevent text recognition by Optical Character Recognition (OCR) tools and impose additional challenges in filtering this type of spam. In this paper, we propose an image spam filtering tech...

Fusion of Text and Image Features: A New Approach to Image Spam Filtering

While enjoying the convenience of email communications, many users have also experienced annoying email spam. Even though current spam detection approaches have gained a competitive edge against text-based email spam, they still face the challenge arising from image-based spam (image spam for short). Image spam normally includes embedded images that contain the spam messages in binary format rath...

Image spam filtering using textual and visual information

In this paper we focus on the so-called image spam, which consists of embedding the spam message into images attached to e-mails to circumvent statistical techniques based on the analysis of the body text of e-mails (like “Bayesian filters”), and of applying content-obscuring techniques to such images to make them unreadable by standard OCR systems without compromising human readability. We arg...

Embedded-Text Detection and Its Application to Anti-Spam Filtering

By Ching-Tung Wu. Embedded text in images usually carries important messages about the content. In the past, several algorithms have been proposed to detect text boxes in video frames. Previous work often followed a multi-step framework using a combination of image-analysis and machine-learning techniques. In this work, we propose a u...

A Sobel Edge Detection Algorithm Based System for Analyzing and Classifying Image Based Spam

Early spam emails were only text-based; however, spammers have since moved to more sophisticated techniques that involve images, now generally termed image-based spam. In most image-based spam, the entire spam message, which may sometimes be text, is embedded in an image of any format. This type of spam email adds another dimension to the spam filtering problem. Extracting text f...

Publication date: 2008